Communication-Minimizing Algorithms for Matrix Multiplication

Author

  • Jacob Scott
Abstract

As computers increase in speed, the proportion of time spent on communication between cache and hard drive or between multiple processors continues to rise. For single processors, data must be moved between the processor’s fast-access cache and main memory, an operation that often takes many orders of magnitude longer than any arithmetic operation. When multiple levels of cache are present, a cache-oblivious algorithm, one that will work effectively for any memory hierarchy, is desired, but designing such algorithms has proven challenging.
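A classic example of such a cache-oblivious algorithm is recursive blocking: split each matrix into quadrants and recurse until the sub-problems are small enough to fit in whatever cache happens to be present. The sketch below is only an illustration of that general idea in Python/NumPy, not code from the paper; the function name and the base-case cutoff are arbitrary choices made for the example.

import numpy as np

def matmul_recursive(A, B, base=64):
    # Divide-and-conquer multiply C = A @ B for square n-by-n arrays with
    # n a power of two. The recursion eventually reaches blocks small enough
    # to fit in each level of cache without ever being told the cache sizes,
    # which is what makes the scheme cache-oblivious.
    n = A.shape[0]
    if n <= base:
        return A @ B          # small blocks: fall back to the library kernel
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    C = np.empty_like(A)
    C[:h, :h] = matmul_recursive(A11, B11, base) + matmul_recursive(A12, B21, base)
    C[:h, h:] = matmul_recursive(A11, B12, base) + matmul_recursive(A12, B22, base)
    C[h:, :h] = matmul_recursive(A21, B11, base) + matmul_recursive(A22, B21, base)
    C[h:, h:] = matmul_recursive(A21, B12, base) + matmul_recursive(A22, B22, base)
    return C

# sanity check against the library routine
A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(matmul_recursive(A, B), A @ B)

Because the recursion never refers to a cache size, the same code adapts to every level of a multi-level memory hierarchy, which is the property the abstract alludes to.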


Similar articles

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel matrix multiplication algorithm that can run on a Fibonacci Hypercube structure. Most popular parallel matrix multiplication algorithms cannot run on a Fibonacci Hypercube, so a method that runs on all structures, and on the Fibonacci Hypercube in particular, is needed for parallel matr...


Minimizing the Communication Time for Matrix Multiplication on Multiprocessors

We present one matrix multiplication algorithm for two-dimensional arrays of processing nodes, and one algorithm for three-dimensional nodal arrays. One-dimensional nodal arrays are treated as a degenerate case. The algorithms are designed to fully utilize the communications bandwidth in high-degree networks in which the one-, two-, or three-dimensional arrays may be embedded. For binary n-cube...
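The best-known example of a matrix multiplication algorithm for a two-dimensional grid of processing nodes is Cannon's algorithm, simulated serially below as a rough illustration of the 2D approach. It is not claimed to be the specific algorithm of this paper, and every name in the snippet is invented for the example.

import numpy as np

def cannon_matmul(A, B, p):
    # Serial simulation of Cannon's algorithm on a p-by-p grid of nodes.
    # Node (i, j) owns one block each of A, B and C. In every step it
    # multiplies its current blocks, then the A blocks shift one node left
    # along each row and the B blocks shift one node up along each column,
    # so every communication link of the 2D grid is used in every step.
    n = A.shape[0]
    b = n // p                              # block size; assumes p divides n

    def blk(M, i, j):
        return M[i*b:(i+1)*b, j*b:(j+1)*b].copy()

    # initial skew: node (i, j) starts with A block (i, i+j) and B block (i+j, j)
    Ab = [[blk(A, i, (i + j) % p) for j in range(p)] for i in range(p)]
    Bb = [[blk(B, (i + j) % p, j) for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    for _ in range(p):
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]   # shift left
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]   # shift up
    return np.block(Cb)

# sanity check against the library routine
A = np.random.rand(12, 12)
B = np.random.rand(12, 12)
assert np.allclose(cannon_matmul(A, B, 3), A @ B)

Each node sends one block of roughly n^2/p^2 words per step over p steps, so its total communication volume is about n^2/p words; that bandwidth profile is what 2D-grid algorithms are designed around.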


Minimizing Communication in Numerical Linear Algebra

In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and a large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it...
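For context, the bound in question is usually written as follows (a standard statement of the Hong and Kung result, not a quotation from this abstract): with a fast memory of M words, any execution of the conventional O(n^3) algorithm must move at least

    \Omega\left( \frac{n^3}{\sqrt{M}} \right)

words between fast and slow memory, and the Irony, Toledo and Tiskin extension gives the analogous per-processor bound \Omega\left( \frac{n^3}{P\sqrt{M}} \right) when the multiplication is distributed over P processors.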


Communication-optimal Parallel and Sequential Cholesky Decomposition

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lo...


Minimizing Loss of Information at Competitive PLIP Algorithms for Image Segmentation with Noisy Background

In this paper, two training systems for selecting PLIP parameters have been demonstrated. The first compares the MSE of a high precision result to that of a lower precision approximation in order to minimize loss of information. The second uses EMEE scores to maximize visual appeal and further reduce information loss. It was shown that, in the general case of basic addition, subtraction, or mul...

Publication date: 2014